03_Networks

Authors: Diego Senso González, Luis Vaciero
15 January 2021
Module: Machine Learning - Master's Degree in Data Science for Finance

Objective

The purpose of this document is to visualize the available data as networks.

Libraries

First, we import the required libraries.

Loading the data

We remove the first column of each imported dataset, which contains no data.
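The original loading code is not shown here. As a minimal sketch of the step described above, assuming the datasets are CSV files whose first column is an empty index column, the column can be dropped while reading. The helper name `drop_first_column` and the sample column names are hypothetical.

```python
import csv
import io


def drop_first_column(csv_text):
    """Parse CSV text and return its rows without the first column."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row[1:] for row in reader]


# Hypothetical sample: the first column carries no useful data
sample = ",origin,destination\n0,Centro,Retiro\n1,Retiro,Centro\n"
rows = drop_first_column(sample)
header, records = rows[0], rows[1:]
```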

Initial view

To begin, we visualize a simple network, selecting the arrival and departure points of the Bicimad dataset (which contains all the trips made).

Clearly, the result is not very informative. There are many origin and arrival points, and many journeys. Areas where more lines accumulate can be made out, which could indicate more traffic between those points. However, this is only a first approximation.
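The impression that lines accumulate between certain points can be checked numerically by counting how many trips share the same origin and destination. The sketch below is illustrative only: the trip pairs are invented placeholders, not values from the Bicimad dataset, and `count_trips` is a hypothetical helper.

```python
from collections import Counter


def count_trips(trips):
    """Count trips per (origin, destination) pair."""
    return Counter(trips)


# Hypothetical trips as (origin, destination) pairs
trips = [("A", "B"), ("A", "B"), ("B", "A"), ("A", "C"), ("C", "C")]

edge_weights = count_trips(trips)

# Pair with the most trips, as a list with one (pair, count) entry
busiest = edge_weights.most_common(1)
```

Each distinct pair becomes one edge, and its count can later serve as the edge weight.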

Street Networks

First, we visualize all the connections between the streets of the bicycle routes. Given the high number of streets and trips, the network is not very informative. Each blue dot represents a street, and the overlapping lines are the trips made.

Neighborhood Networks

Next, we look at the journeys by neighborhood. In this graph the lines connecting the points are already easier to see. Each neighborhood also has connections with a large number of other neighborhoods, in addition to itself.

Selecting only the neighborhoods of origin, we can build another type of network visualization. In this case, multiple edges fan out from each node to other points in concentrated groups. Each node set apart from the rest is a trip origin, while the clustered nodes are the destinations. This is another way of visualizing the network, focusing the display on the trip origin points.

District Networks

Now we move on to visualizing by district. This is the clearest visualization, since the number of districts in the city of Madrid is low compared to the number of streets and neighborhoods. Here it becomes visible that the graph is directed, since the edges connecting the points have a direction. These edges are usually bidirectional, since trips are made in both directions.
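The observation that most edges are bidirectional can be quantified as the share of directed edges whose reverse edge also exists. This is a minimal sketch under assumptions: the edge list is a made-up sample, and `reciprocity` is a hypothetical helper that ignores trips starting and ending in the same district.

```python
def reciprocity(edges):
    """Share of directed edges (excluding self-loops) whose reverse also exists."""
    edge_set = set(edges)
    non_loops = {(u, v) for u, v in edge_set if u != v}
    if not non_loops:
        return 0.0
    reciprocated = {(u, v) for u, v in non_loops if (v, u) in edge_set}
    return len(reciprocated) / len(non_loops)


# Hypothetical directed edges between districts
edges = [
    ("Centro", "Retiro"),
    ("Retiro", "Centro"),
    ("Centro", "Salamanca"),
    ("Centro", "Centro"),  # self-loop, ignored in the ratio
]

share_bidirectional = reciprocity(edges)
```

A value close to 1 would support the reading that nearly every connection is used in both directions.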

A peculiarity of these networks, compared with networks built from other data, is that almost all nodes appear to be connected to one another. This happens because, over a year, at least some bicycle trips take place between almost every pair of nodes.

After that visualization, we wanted a graph showing the number of trips that went from one district to another, so we manually included all possible combinations between districts. Then we set the number of trips as the weight of each of those connections.
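The combinations were entered manually in the original analysis. As an alternative sketch, not the authors' code, the same table of district pairs can be generated with `itertools.product` and filled with trip counts. The district subset, the trips, and the helper `district_weight_table` are assumptions for illustration.

```python
from collections import Counter
from itertools import product


def district_weight_table(trips, districts):
    """Map every ordered district pair, including a district with itself, to its trip count."""
    counts = Counter(trips)
    return {(o, d): counts.get((o, d), 0) for o, d in product(districts, repeat=2)}


# Hypothetical subset of districts and trips
districts = ["Centro", "Retiro", "Salamanca"]
trips = [
    ("Centro", "Retiro"),
    ("Centro", "Retiro"),
    ("Retiro", "Centro"),
    ("Salamanca", "Salamanca"),
]

weights = district_weight_table(trips, districts)
```

Pairs with no recorded trips get a weight of 0, so they can be filtered out before drawing if desired.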

The result is as follows. The connections between districts are displayed as before, but each edge is now also labeled with the number of trips made along that connection.

Graph of each of the Districts

Finally, we want to represent the trips from each district.
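One way to obtain the data behind each per-district graph is to keep only the edges leaving a given district. The sketch below assumes a dictionary of weighted edges keyed by (origin, destination). The counts are invented, and `trips_from` is a hypothetical helper.

```python
def trips_from(weights, origin):
    """Return destinations reached from `origin`, sorted by trip count in descending order."""
    outgoing = {d: w for (o, d), w in weights.items() if o == origin and w > 0}
    return dict(sorted(outgoing.items(), key=lambda item: item[1], reverse=True))


# Hypothetical weighted edges: (origin, destination) -> number of trips
weights = {
    ("CENTRO", "RETIRO"): 5,
    ("CENTRO", "SALAMANCA"): 8,
    ("CENTRO", "TETUÁN"): 0,
    ("RETIRO", "CENTRO"): 4,
}

centro_trips = trips_from(weights, "CENTRO")
```

Calling the helper once per district name yields one sub-network per section below.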

CENTRO

ARGANZUELA

RETIRO

SALAMANCA

CHAMARTÍN

TETUÁN

CHAMBERÍ

MONCLOA-ARAVACA